Yimeng Zhang\* Michigan State University, USA

Zhiwen Fan\* University of Texas at Austin, USA

Shiyu Chang UC, Santa Barbara, USA

Akshay Karkal Kamath\* Georgia Institute of Technology, USA

Wuyang Chen University of Texas at Austin, USA

Sijia Liu Michigan State University, USA

Qiucheng Wu\* UC, Santa Barbara, USA

Zhangyang Wang University of Texas at Austin, USA

Cong Hao Georgia Institute of Technology, USA







#### Background

- The efficiency of "compressed" models is evaluated **without considering the practical hardware platform**, such as low-power FPGAs.
- Existing accelerators are evaluated on the ImageNet dataset with small input images and do not scale to real-world high-definition (HD) video frames.
- This work focuses on Multi-Object Tracking (MOT).





















#### Method

MICHIGAN STATE

UNIVERSITY





Figure: Patch drop for the input frame and intermediate features. Panels: (a) input frame, (b) saliency mask, (c) masked input, (e) FM(2,0), (f) FM(2,0) with mask, (g) FM(3,0).
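Spatial patch drop can be sketched as follows. The slides do not say how the saliency mask is produced or how patches are scored, so this sketch assumes a given per-pixel saliency map, non-overlapping square patches, the mean saliency inside a patch as its score, and zeroing as the drop operation. The same routine could be applied to an intermediate feature map instead of the raw frame.

```python
# Minimal sketch of saliency-guided patch drop. Assumptions (not from the
# slides): saliency is given, patches are square and non-overlapping, a patch
# is scored by its mean saliency, and dropped patches are set to zero.
from typing import List

Grid = List[List[float]]


def patch_scores(saliency: Grid, patch: int) -> Grid:
    """Mean saliency inside each non-overlapping patch x patch block."""
    rows, cols = len(saliency) // patch, len(saliency[0]) // patch
    scores = []
    for pr in range(rows):
        row = []
        for pc in range(cols):
            block = [saliency[r][c]
                     for r in range(pr * patch, (pr + 1) * patch)
                     for c in range(pc * patch, (pc + 1) * patch)]
            row.append(sum(block) / len(block))
        scores.append(row)
    return scores


def drop_patches(frame: Grid, saliency: Grid, patch: int, drop_ratio: float) -> Grid:
    """Zero out the drop_ratio fraction of patches with the lowest saliency."""
    scores = patch_scores(saliency, patch)
    ranked = sorted((s, pr, pc) for pr, row in enumerate(scores)
                    for pc, s in enumerate(row))
    n_drop = int(round(drop_ratio * len(ranked)))
    out = [list(r) for r in frame]  # copy so the input frame is left untouched
    for _, pr, pc in ranked[:n_drop]:
        for r in range(pr * patch, (pr + 1) * patch):
            for c in range(pc * patch, (pc + 1) * patch):
                out[r][c] = 0.0
    return out
```

With `drop_ratio=0.2`, a 4 × 10 input split into 2 × 2 patches (10 patches) loses its 2 least salient patches.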












- **Pre-define** irregular sparse patterns for 3 × 3 kernels.
- Leverage these patterns to perform irregular but pattern-aware weight pruning.
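Pattern-aware pruning of 3 × 3 kernels can be sketched like this. The slides do not list the actual patterns, so the four 4-entry patterns below are hypothetical, and choosing the pattern that preserves the largest squared weight magnitude is an assumed selection rule. On its own, this sketch removes 5 of the 9 weights in every kernel.

```python
# Sketch of pattern-aware pruning for 3x3 kernels. Assumptions (not from the
# slides): the candidate patterns are hypothetical, each keeps 4 of 9 weights,
# and each kernel takes the pattern that keeps the most L2 energy.
from typing import List, Tuple

Kernel = List[List[float]]
Pattern = Tuple[Tuple[int, int], ...]

# Hypothetical pre-defined patterns: the centre weight plus three neighbours.
PATTERNS: List[Pattern] = [
    ((0, 1), (1, 0), (1, 1), (1, 2)),
    ((1, 0), (1, 1), (1, 2), (2, 1)),
    ((0, 1), (1, 1), (1, 2), (2, 1)),
    ((0, 1), (1, 0), (1, 1), (2, 1)),
]


def best_pattern(kernel: Kernel) -> Pattern:
    """Pick the pattern whose kept positions carry the largest squared magnitude."""
    return max(PATTERNS, key=lambda p: sum(kernel[r][c] ** 2 for r, c in p))


def apply_pattern(kernel: Kernel, pattern: Pattern) -> Kernel:
    """Zero every weight outside the chosen pattern."""
    keep = set(pattern)
    return [[kernel[r][c] if (r, c) in keep else 0.0 for c in range(3)]
            for r in range(3)]


def pattern_prune(kernels: List[Kernel]) -> List[Kernel]:
    """Apply the best-matching pre-defined pattern to each 3x3 kernel."""
    return [apply_pattern(k, best_pattern(k)) for k in kernels]
```

Because every kernel follows one of a few fixed shapes, the sparsity stays irregular across kernels while remaining regular enough for a hardware design to exploit.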








Figure: Overall pipeline. Frames 1, 3, and 4 are processed while Frame 2 is dropped; the blocks shown are the Backbone, RPN+FPN, and the hardware design backend.
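Temporal frame dropping removes whole frames before they reach the Backbone. The slides give the drop rate (40%) but not the selection policy, so this sketch assumes a fixed, evenly spaced schedule; `kept_frames` is a hypothetical helper name.

```python
# Sketch of evenly spaced temporal frame dropping. Assumption: frames are
# dropped on a fixed schedule; the actual selection policy is not specified.
from fractions import Fraction
from math import floor
from typing import List


def kept_frames(num_frames: int, drop_ratio: str) -> List[int]:
    """Indices of frames to process when drop_ratio of them are skipped."""
    r = Fraction(drop_ratio)  # exact rational ("0.4" -> 2/5), avoids float drift
    kept = []
    for i in range(num_frames):
        # Frame i is dropped whenever the running drop quota crosses an integer.
        if floor((i + 1) * r) == floor(i * r):
            kept.append(i)
    return kept


print(kept_frames(10, "0.4"))  # [0, 1, 3, 5, 6, 8]
```

Using `Fraction` keeps the schedule exact, so exactly 40 of every 100 frames are dropped.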

















| Method                      | Data reduction (frame, patch) | Pruning | IDF1 (↑) | MOTA (↑) | Latency, ms (↓) | EFR, FPS (↑) | Power (↓) | Energy per frame (↓) |
|-----------------------------|-------------------------------|---------|----------|----------|-----------------|--------------|-----------|----------------------|
| **QDTrack** (GPU baseline)  | ×                             | ×       | 0.714    | 0.637    | 60.9            | 22.5         | 296 W     | 13.2 J/frame         |
| QDTrack on FPGA             | ×                             | ×       | 0.714    | 0.637    | 554.7           | 1.8          | 50.8 W    | 28.2 J/frame         |
| Variant: frame + patch drop | (40%, 20%)                    | ×       | 0.71     | 0.628    | 443.8           | 2.3          | 50.8 W    | 22.0 J/frame         |
| Tri-design (ours)           | (40%, 20%)                    | 90%     | 0.704    | 0.617    | 44.4            | 37.6         | 50.8 W    | 1.35 J/frame         |
| Improv. over GPU baseline   | -                             | -       | -1.40%   | -3.14%   | 1.37×           | **1.67×**    | 5.83×     | 9.78×                |
| Improv. over FPGA baseline  | -                             | -       | -1.40%   | -3.14%   | 12.5×           | **20.9×**    | -         | **20.9×**            |
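The "Improv." rows in the table above can be reproduced from the other rows: for lower-is-better metrics (latency, power, energy) the factor is baseline / ours, for EFR it is ours / baseline, and accuracy is reported as a signed relative change. The helper names below are illustrative, not from the slides.

```python
# Reproduces the "Improv." rows from the table values. Helper names are
# illustrative; the ratio conventions follow the arrows in the table header.


def ratio_lower_better(baseline: float, ours: float) -> float:
    """Improvement factor for latency, power, energy: baseline / ours."""
    return baseline / ours


def ratio_higher_better(baseline: float, ours: float) -> float:
    """Improvement factor for EFR: ours / baseline."""
    return ours / baseline


def relative_change_pct(baseline: float, ours: float) -> float:
    """Signed accuracy change in percent (negative means a drop)."""
    return (ours - baseline) / baseline * 100.0


# Tri-design vs. the GPU baseline.
print(round(ratio_lower_better(60.9, 44.4), 2))     # latency: 1.37
print(round(ratio_higher_better(22.5, 37.6), 2))    # EFR: 1.67
print(round(ratio_lower_better(13.2, 1.35), 2))     # energy: 9.78
print(round(relative_change_pct(0.714, 0.704), 2))  # IDF1: -1.4
```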

#### **Implementation Details**

- 40% temporal **frame** dropping
- 20% spatial **patch** dropping
- 90% model pruning
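How the three reductions compound can be illustrated with an idealized model. This is an assumption, not the measurement method: per-frame latency is taken to scale linearly with the fraction of kept patches and kept weights, and EFR counts dropped frames as covered, so it is divided by the kept-frame fraction. Starting from the 554.7 ms FPGA baseline, this model lands at the rounded tri-design values of 44.4 ms and 37.6 FPS, although real hardware rarely scales this cleanly.

```python
# Idealized compounding of patch drop, pruning, and frame drop. Assumption:
# latency scales linearly with kept patches and kept weights, and dropped
# frames count as covered when computing the effective frame rate.


def idealized_latency_ms(base_ms: float, patch_drop: float, prune_ratio: float) -> float:
    """Per-processed-frame latency if compute shrank linearly with both reductions."""
    return base_ms * (1.0 - patch_drop) * (1.0 - prune_ratio)


def effective_fps(latency_ms: float, frame_drop: float) -> float:
    """Input frames covered per second when only (1 - frame_drop) are processed."""
    return 1000.0 / latency_ms / (1.0 - frame_drop)


lat = idealized_latency_ms(554.7, patch_drop=0.2, prune_ratio=0.9)
efr = effective_fps(lat, frame_drop=0.4)
print(f"{lat:.1f} ms, {efr:.1f} FPS")  # 44.4 ms, 37.6 FPS
```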

#### Hardware Metrics

- On-board **latency**, in milliseconds
- Effective frame rate (**EFR**), in frames per second (FPS)
- **Power**, in watts
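Energy per frame follows from these units: watts divided by frames per second gives joules per frame. Assuming energy is computed as power / EFR, this reproduces the GPU baseline, FPGA baseline, and tri-design entries of the results table (about 13.16, 28.22, and 1.35 J/frame before rounding). The function name is illustrative.

```python
# Energy per frame from average power and effective frame rate.
# Assumption: the reported J/frame values are power / EFR.


def energy_per_frame_j(power_w: float, efr_fps: float) -> float:
    """W / (frames/s) = J/frame."""
    return power_w / efr_fps


rows = [("GPU baseline", 296.0, 22.5),
        ("FPGA baseline", 50.8, 1.8),
        ("Tri-design", 50.8, 37.6)]
for name, power, efr in rows:
    print(f"{name}: {energy_per_frame_j(power, efr):.2f} J/frame")
```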

#### Accuracy

- ID F1 score (**IDF1**)
- Multi-Object Tracking Accuracy (**MOTA**)
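Both accuracy metrics use their standard definitions: MOTA = 1 − (FN + FP + IDSW) / GT, with counts summed over all frames, and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The sketch below computes them from aggregate counts. The example counts are hypothetical, and in practice the counts come from matching predicted tracks to ground truth.

```python
# Standard MOT accuracy metrics from aggregate counts (summed over all frames).


def mota(fn: int, fp: int, id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT; can be negative for very poor trackers."""
    return 1.0 - (fn + fp + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), the F1 score of identity matches."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


# Hypothetical counts, for illustration only.
print(mota(fn=20, fp=10, id_switches=5, num_gt=100))
print(idf1(idtp=80, idfp=10, idfn=20))
```

MOTA is dominated by detection errors (FN, FP), while IDF1 rewards keeping the same identity on a target over time, which is why the table reports both.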







